Confidence sets for split points in decision trees
We investigate the problem of finding confidence sets for split points in
decision trees (CART). Our main results establish the asymptotic distribution
of the least squares estimators and some associated residual sum of squares
statistics in a binary decision tree approximation to a smooth regression
curve. Cube-root asymptotics with nonnormal limit distributions are involved.
We study various confidence sets for the split point, one calibrated using the
subsampling bootstrap, and others calibrated using plug-in estimates of some
nuisance parameters. The performance of the confidence sets is assessed in a
simulation study. A motivation for developing such confidence sets comes from
the problem of phosphorus pollution in the Everglades. Ecologists have
suggested that split points provide a phosphorus threshold at which biological
imbalance occurs, and the lower endpoint of the confidence set may be
interpreted as a level that is protective of the ecosystem. This is illustrated
using data from a Duke University Wetlands Center phosphorus dosing study in
the Everglades.
Comment: Published at http://dx.doi.org/10.1214/009053606000001415 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
- …